Abstract

For an error-correcting code and a distance bound, the list decoding problem is to compute
all the codewords within the given distance of a received message. The bounded distance
decoding problem, on the other hand, is to find one codeword if there exist one or more
codewords within the given distance, or to output the empty set if there are none. Obviously
the bounded distance decoding problem is no harder than the list decoding problem. For a
Reed-Solomon code $[n, k]_q$, a simple counting argument shows that for any integer $g < n$,
there exists at least one Hamming ball of radius $n - g$ which contains at least
$\binom{n}{g}/q^{g-k}$ codewords. Let $g(n,k,q)$ be the smallest integer $g$ such that
$\binom{n}{g}/q^{g-k} < 1$. For distance bounds between $n - \sqrt{nk}$ and $n - g(n,k,q)$,
we do not know whether the Reed-Solomon code is list decodable or bounded distance decodable,
nor do we know whether there are polynomially many codewords in all balls of that radius. It is
generally believed that the answers to both questions are no. Public key cryptosystems whose
security is based on these assumptions have been proposed recently. In this paper, we prove:
(1) List decoding cannot be done for radius $n - g(n,k,q)$ or larger, otherwise the discrete
logarithm over $F_{q^{g(n,k,q)-k}}$ is easy. (2) Let $h$ be a positive integer satisfying
$h < q^{1/4} - 2$. We show that the discrete logarithm problem over $F_{q^h}$ can be efficiently
reduced to the bounded distance decoding problem of the Reed-Solomon code $[q, 3h+4]_q$ with
radius $q - 4h - 4$. These results show that the decoding problems for the Reed-Solomon code
are at least as hard as the discrete logarithm problem over finite fields. The main tools used
to obtain these results are an interesting connection between the problem of list decoding of
Reed-Solomon codes and the problem of discrete logarithms over finite fields, and a
generalization of Katz's theorem, which concerns representations of elements in an extension
finite field by products of linear factors.

1 Introduction and Motivation 

An error-correcting code $C$ over an alphabet $\Sigma$ is an injective map
$\phi : \Sigma^k \to \Sigma^n$. When we need to transmit a message of $k$ letters over a noisy
channel, we first apply the map to the message (i.e. encode the message) and send its image
(i.e. the codeword) of $n$ letters over the channel. The Hamming distance between two sequences
of letters of the same length is the number of positions where the two sequences differ. A good
error-correcting code should have a large minimum distance $d$, which is defined to be the
minimum Hamming distance between any two codewords in $\phi(\Sigma^k)$. A received message,
possibly corrupted but with no more than $(d-1)/2$ errors, corresponds to a unique codeword, and
thus can be decoded into the original message despite the errors that occurred during the
communication.

Error-correcting codes are widely used in practice and are mathematically interesting and
intriguing. They have recently attracted the attention of the theoretical computer science
community. Several major achievements of theoretical computer science, notably Probabilistically
Checkable Proofs and derandomization techniques, rely heavily on techniques from
error-correcting codes. We refer to the survey [14] for details.

For the purpose of efficient encoding and decoding, $\Sigma$ is usually set to be a finite field,
and the map $\phi$ is set to be linear. Numerous error-correcting codes have been proposed; among
them, the Reed-Solomon codes are particularly important. They were deployed to transmit
information to and from spaceships, and were used to store information in optical media. The
Reed-Solomon code $[n, k]_q$ is the map from $(a_0, a_1, \ldots, a_{k-1}) \in F_q^k$ to
$(a_0 + a_1 x + \cdots + a_{k-1} x^{k-1})_{x \in S}$, for some $S \subseteq F_q$ with $|S| = n$.
(The choice of $S$ will not affect our results in this paper.) Since any two different
polynomials of degree at most $k - 1$ can share at most $k - 1$ points, the minimum distance of
the Reed-Solomon code is $n - k + 1$. If the radius of a Hamming ball is less than half of the
minimum distance, there is at most one codeword in the Hamming ball. Finding that codeword is
called unambiguous decoding. This problem has been solved; see [5] for a simple algorithm.
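
The encoding map and the minimum distance $n - k + 1$ can be checked on a toy example. The following is a minimal sketch, assuming a prime field $F_p$ (so that arithmetic is plain modular arithmetic) and the evaluation set $S = \{0, 1, \ldots, n-1\}$; the function names are illustrative and not taken from the paper.

```python
from itertools import product

def rs_encode(message, points, p):
    """Evaluate a_0 + a_1 x + ... + a_{k-1} x^{k-1} at every x in `points` over F_p."""
    codeword = []
    for x in points:
        value = 0
        for coeff in reversed(message):  # Horner's rule
            value = (value * x + coeff) % p
        codeword.append(value)
    return codeword

def hamming_distance(u, v):
    """Number of positions where the two sequences differ."""
    return sum(1 for a, b in zip(u, v) if a != b)

def minimum_distance(n, k, p):
    """Brute-force minimum distance; by linearity it is the minimum nonzero weight."""
    points = list(range(n))  # assumed choice of S, requires n <= p
    zero = [0] * n
    return min(
        hamming_distance(rs_encode(list(msg), points, p), zero)
        for msg in product(range(p), repeat=k)
        if any(msg)
    )

p, n, k = 7, 6, 3
print(rs_encode([1, 2, 3], list(range(n)), p))  # codeword of 1 + 2x + 3x^2
print(minimum_distance(n, k, p), n - k + 1)     # the two values should coincide
```

The exhaustive search over all $p^k$ messages is only feasible for tiny parameters; it is meant to illustrate the definition, not to serve as a practical routine.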

If we gradually increase the radius, there will be two or more codewords lying in some Hamming
balls. Can we efficiently enumerate all the codewords in any Hamming ball of a certain radius?
This is the so-called list decoding problem. The notion was first introduced by Elias [5]. There
was virtually no progress on this problem for radius slightly larger than half of the minimum
distance, until Sudan published his influential paper. His result was subsequently improved; the
best algorithm [Hj solves the list decoding problem for radius as large as $n - \sqrt{nk}$. The
work sheds new light on the limitation of list decodability of Reed-Solomon codes. At the other
extreme, if the radius is greater than or equal to the minimum distance, there are exponentially
many codewords in some Hamming balls.

The decoding problem of Reed-Solomon codes can be formulated as the problem of curve fitting or
polynomial reconstruction. In this problem, we are given $n$ points
$(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$. The goal is to find polynomials of degree $k - 1$
that pass through at least $g$ points. In this paper, we only consider the case when the points
have distinct $x$-coordinates. If we allow multiple occurrences of $x$-coordinates, the problem
is NP-hard jS], and it is not relevant to the Reed-Solomon decoding problem. If $g > (n + k)/2$,
it corresponds to the unambiguous decoding of Reed-Solomon codes. If $g > \sqrt{nk}$, i.e. the
radius is less than $n - \sqrt{nk}$, the problem can be solved by the Guruswami-Sudan algorithm.
If $g < k$, it is possible that there are exponentially many solutions, but finding one is very
easy.
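
To make the polynomial reconstruction problem concrete, here is a small sketch that solves it by exhaustive search over a prime field. It is not one of the decoding algorithms cited above, only a brute-force illustration; the function names and the sample points are assumptions chosen for the example.

```python
from itertools import product

def evaluate(coeffs, x, p):
    """Evaluate the polynomial with coefficients (a_0, a_1, ...) at x over F_p."""
    value = 0
    for c in reversed(coeffs):
        value = (value * x + c) % p
    return value

def reconstruct(points, k, g, p):
    """All coefficient tuples (a_0..a_{k-1}) over F_p whose polynomial passes
    through at least g of the given points. Exhaustive: p**k candidates."""
    solutions = []
    for coeffs in product(range(p), repeat=k):
        agreements = sum(1 for x, y in points if evaluate(coeffs, x, p) == y)
        if agreements >= g:
            solutions.append(coeffs)
    return solutions

p, k = 7, 2
# Points on the line y = 3 + 2x over F_7, with the values at x = 1 and x = 4 corrupted.
points = [(0, 3), (1, 0), (2, 0), (3, 2), (4, 6), (5, 6)]

print(reconstruct(points, k, 4, p))       # four agreements out of six: a single line survives
print(len(reconstruct(points, k, 3, p)))  # a smaller g admits a longer list
```

With $n = 6$ and $k = 2$, two distinct lines that each agree with four of the six points would share at least two points, so the list for $g = 4$ has exactly one element.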

In this paper, we study the following question: How far can we increase the radius before the
list decoding problem or the bounded distance decoding problem becomes infeasible? The question
has been under intensive investigation for Reed-Solomon codes and other error-correcting codes.
The case of general non-linear codes has been solved |BJ. The case of linear codes is much
harder. Some partial results have been obtained in [SI [7]. However, none of them applies to
Reed-Solomon codes. No negative result is known about the list decodability of Reed-Solomon
codes, except a simple bound given by Justesen and Høholdt JUj, which states that for any
positive integer $g < n$, there exists at least one Hamming ball of radius $n - g$ which contains
at least $\binom{n}{g}/q^{g-k}$ codewords. This bound matches the intuition well. Consider an
imaginary algorithm as follows: randomly select $g$ points from the $n$ input points, and use
polynomial interpolation to get a polynomial of degree at most $g - 1$ which passes through these
$g$ points. Then with probability $1/q^{g-k}$, the resulting polynomial has degree at most
$k - 1$. The sample space has size $\binom{n}{g}$. Thus heuristically, the number of codewords in
Hamming balls of radius $n - g$ is at least $\binom{n}{g}/q^{g-k}$ on average. In the same paper,
Justesen and Høholdt also gave an upper bound for the radius of the Hamming balls containing at
most a constant number of codewords.

If we gradually increase $g$, starting from $k$, then $\binom{n}{g}/q^{g-k}$ will fall below 1 at
some point. However, $g$ is still very far away from $\sqrt{nk}$. Let $g(n,k,q)$ be the smallest
integer such that $\binom{n}{g}/q^{g-k}$ is less than 1. The following lemma shows that there is
a gap between $g(n,k,q)$ and $\sqrt{nk}$.
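
The threshold $g(n,k,q)$ is easy to compute exactly with integer arithmetic. The sketch below assumes the search starts at $g = k$, as in the paragraph above; the function name `g_hat` is a label chosen here, not notation from the paper.

```python
from math import comb, sqrt

def g_hat(n, k, q):
    """Smallest integer g in [k, n] with C(n, g) / q**(g - k) < 1.
    The comparison is done on exact integers to avoid floating-point error."""
    for g in range(k, n + 1):
        if comb(n, g) < q ** (g - k):
            return g
    return None  # only reached when k == n

for n, k, q in [(7, 2, 7), (10, 2, 11), (255, 128, 256)]:
    print(n, k, q, g_hat(n, k, q), round(sqrt(n * k), 2))
```

For the larger parameter set the printed threshold sits well below $\sqrt{nk}$, which is the gap the lemma quantifies.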

Lemma 1 1. For positive integers $k < g < n$, if $g > \sqrt{nk}$, then $n^{g-k} > \binom{n}{g}$
(which implies that $q^{g-k} > \binom{n}{g}$).

2. For any constant $0 < c_1 < 1/2$ and fixed $k/n$, if $g = k + c_1(n - k)$, then
$\binom{n}{g}/n^{g-k} < 2^{-c_2 n}$ for some positive constant $c_2$.

In fact, for a fixed rate $k/n$ and $q = \Theta(n)$, $g(n,k,q) = k + \Theta(n/\log n)$. We prove
that if the list decoding of the $[n, k]_q$ Reed-Solomon code is feasible when the radius is
$n - g(n,k,q)$, then the discrete logarithm over $F_{q^{g(n,k,q)-k}}$ is easy. In other words, we
prove that list decoding is not feasible for radius $n - g(n,k,q)$ or larger, assuming that the
discrete logarithm over $F_{q^{g(n,k,q)-k}}$ is hard. Note that this does not rule out the
possibility that there are only polynomially many codewords in all Hamming balls of radius
$n - g(n,k,q)$, even assuming the intractability of the discrete logarithm over
$F_{q^{g(n,k,q)-k}}$.

Theorem 1 If there exists an algorithm solving the list decoding problem of radius $n - g(n,k,q)$
for the Reed-Solomon code $[n, k]_q$ in time $q^{O(1)}$, then the discrete logarithm over the
finite field $F_{q^{g(n,k,q)-k}}$ can be computed in time $q^{O(1)}$.

When the list decoding problem is hard for a certain radius, or a Hamming ball contains too many
codewords for us to enumerate all of them, we can turn our attention to designing an efficient
bounded distance decoding algorithm, which only needs to output one of the codewords in the
ball, or to output the empty set if the ball does not contain any codeword. However, we prove
that bounded distance decoding is hard as well.

Theorem 2 Let $q$ be a prime power and $h$ be a positive integer satisfying $q > (h + 2)^4$. If
the bounded distance decoding problem of radius $q - 4h - 4$ for the Reed-Solomon code
$[q, 3h + 4]_q$ can be solved in time $q^{O(1)}$, the discrete logarithm problem over $F_{q^h}$
can be solved in time $q^{O(1)}$.

To prove the theorem, we naturally come across the following question: In a finite field
$F_{q^h}$, for any $\alpha$ such that $F_{q^h} = F_q[\alpha]$, can $F_q + \alpha$ generate the
multiplicative group $(F_{q^h})^*$? This interesting problem has many applications in graph
theory, and it has been studied by several number theorists. Chung [1] proved that if
$q > (h - 1)^2$, then $(F_{q^h})^*$ is generated by $F_q + \alpha$. Wan [TS] showed a negative
result: if $q^h - 1$ has a divisor $d > 1$ and $h > 2(q \log_q d + \log_q(q + 1))$, then
$(F_{q^h})^*$ is not generated by $F_q + \alpha$ for some $\alpha$. Katz [11] applied the
Lang-Weil method, and showed that for every $h > 2$ there exists a constant $B(h)$ such that for
any finite field $F_q$ with $q > B(h)$, any element in $(F_{q^h})^*$ can be written as a product
of exactly $n = h + 2$ distinct elements from $F_q + \alpha$. Clearly $B(h)$ has to be an
exponential function. In this paper, we obtain a generalization of Katz's theorem, in which we
use a bigger $n$ and manage to decrease $B(h)$ to a polynomial function. For details, see
Section 3.2.
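
The generation question can be explored by brute force in a very small field. The sketch below assumes the concrete model $F_9 = F_3[x]/(x^2 + 1)$ with $\alpha = x$, for which $q = 3 > (h-1)^2 = 1$, and enumerates the subgroup generated by $F_q + \alpha$. The representation of field elements as coefficient tuples and all function names are choices made for this illustration.

```python
def mul_mod(u, v, modulus, p):
    """Multiply two elements of F_p[x]/(modulus). Elements are coefficient
    tuples of length h (lowest degree first); modulus is monic of degree h."""
    h = len(modulus) - 1
    prod = [0] * (2 * h - 1)
    for i, x in enumerate(u):
        for j, y in enumerate(v):
            prod[i + j] = (prod[i + j] + x * y) % p
    # Reduce high-degree terms using x^h = -(modulus[0] + ... + modulus[h-1] x^(h-1)).
    for d in range(2 * h - 2, h - 1, -1):
        c = prod[d]
        if c:
            prod[d] = 0
            for j in range(h):
                prod[d - h + j] = (prod[d - h + j] - c * modulus[j]) % p
    return tuple(prod[:h])

def generated_subgroup(generators, modulus, p):
    """Closure of the generators under multiplication; in a finite group this
    closure is exactly the subgroup they generate."""
    h = len(modulus) - 1
    one = (1,) + (0,) * (h - 1)
    seen = {one}
    frontier = [one]
    while frontier:
        x = frontier.pop()
        for gen in generators:
            y = mul_mod(x, gen, modulus, p)
            if y not in seen:
                seen.add(y)
                frontier.append(y)
    return seen

p, modulus = 3, (1, 0, 1)                    # F_9 = F_3[x]/(x^2 + 1), alpha = x
alpha_plus_Fq = [(a, 1) for a in range(p)]   # the elements a + alpha for a in F_3
group = generated_subgroup(alpha_plus_Fq, modulus, p)
print(len(group), p ** 2 - 1)
```

Here the generated subgroup has the full order $q^h - 1 = 8$, in line with the condition $q > (h-1)^2$ quoted from Chung.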

It is generally believed that the list decoding problem and the bounded distance decoding
problem for Reed-Solomon codes are computationally hard if the number of errors is greater than
$n - \sqrt{nk}$ and less than $n - k$. This problem has even been used as a hard problem on which
to build public key cryptosystems and pseudorandom generators [12]. A similar problem, noisy
polynomial interpolation, was proved to be vulnerable to attacks based on lattice reduction
techniques, and hence is easier than originally thought. This raises concerns about the hardness
of the polynomial reconstruction problem. Our results confirm the belief that the polynomial
reconstruction problem is hard, under a well-studied hardness assumption in number theory, and
hence provide a firm foundation for many protocols based on the problem.

This paper is organized as follows. In Section 2 we prove Lemma 1. In Section 3 we sketch the
proofs of Theorem 1 and Theorem 2. In Section 4 we show an interesting duality between the size
of a group generated by linear factors and the list size in Hamming balls of Reed-Solomon codes.



2 Proof of Lemma 1

In this section, we prove Lemma 1 by showing the following statement.
Theorem 3 There is no positive integral solution for

$$\binom{n}{g} > n^h, \qquad (1)$$

$$g > \sqrt{n(g-h)}. \qquad (2)$$

We first obtain a finite range for h, g and n. 

Lemma 2 If (n, g, h) is a positive integral solution, then h < 88. 

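Proof: Denote $g/h$ by $\alpha$ and $n/h$ by $\beta$. From $g > \sqrt{n(g-h)}$, we have
$\alpha > \sqrt{\beta(\alpha - 1)}$. Hence $\alpha < \beta < \alpha + 1 + \frac{1}{\alpha-1}$.

Recall that for any positive integer $i$,
$\sqrt{2\pi i}\,(i/e)^i < i! < \sqrt{2\pi i}\,(i/e)^i \left(1 + \frac{1}{12i-1}\right)$. Hence

$$\binom{n}{g} = \binom{\beta h}{\alpha h} < \left(\frac{\beta^{\beta}}{\alpha^{\alpha}(\beta-\alpha)^{\beta-\alpha}}\right)^{h}.$$

Thus $(\beta h)^h = n^h < \left(\frac{\beta^{\beta}}{\alpha^{\alpha}(\beta-\alpha)^{\beta-\alpha}}\right)^{h}$, which implies

$$h < \frac{\beta^{\beta-1}}{\alpha^{\alpha}(\beta-\alpha)^{\beta-\alpha}}.$$

Recall some facts:

1. For $x > 0$, $x^x$ takes its minimum value $0.6922\ldots$ at $x = e^{-1} = 0.36787944\ldots$

2. For $x > 0$, $1 < (1 + \frac{1}{x})^x < e = 2.7182818284\ldots$

If $\alpha \ge 2$, then $\beta - \alpha < 1 + \frac{1}{\alpha-1} \le 2$. We have

$$h < \frac{\beta^{\beta-1}}{\alpha^{\alpha}(\beta-\alpha)^{\beta-\alpha}}
< 1.45\,\frac{\left(1 + \alpha + \frac{1}{\alpha-1}\right)^{\alpha + \frac{1}{\alpha-1}}}{\alpha^{\alpha}}
\le 1.45 \left(1 + \alpha + \frac{1}{\alpha-1}\right)^{\frac{1}{\alpha-1}} \left(1 + \frac{1}{\alpha} + \frac{1}{\alpha(\alpha-1)}\right)^{\alpha}
< 1.45 \cdot 4 \cdot e \cdot 2 < 32.$$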

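If $\alpha < 2$, we still have $h < \frac{\beta^{\beta-1}}{\alpha^{\alpha}(\beta-\alpha)^{\beta-\alpha}}$.
There are two cases. If $\beta < 3$, then

$$h < 1.45^2 \cdot 9 < 19.$$

If $\beta \ge 3$, then

$$h < 1.45 \left(\frac{\beta}{\beta-\alpha}\right)^{\beta-\alpha} \beta^{\alpha-1} < 1.45 \cdot e^3 \cdot 3 < 88.$$

□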



Corollary 1 $\alpha \ge 88/87$ and $\beta - \alpha < 88$.

Note that if $\alpha < 89$, then $\beta < 178$. If $\alpha \ge 89$, then
$\beta - \alpha < 1 + 1/88$, but $n - g = (\beta - \alpha)h$ is an integer, and $h \le 87$, so
$\beta - \alpha \le 1$. So if $n > 2h$, (1) cannot hold.

Proof: Now we can finish proving the main theorem of this section by exhaustively searching, by
computer, for solutions in the finite range $h < 88$, $n < 178 \cdot 88 = 15664$ and
$h < g < n$. □
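
The exhaustive search is straightforward to reproduce on a reduced range. The following sketch is not the authors' program; it assumes the two conditions are $\binom{n}{g} > n^h$ and $g > \sqrt{n(g-h)}$ with $1 \le h < g < n$, and it only scans $n$ up to a small bound so that it runs in well under a second.

```python
from math import comb

def theorem3_solutions(n_max):
    """All (n, g, h) with 1 <= h < g < n <= n_max satisfying both
    C(n, g) > n**h and g > sqrt(n * (g - h)), tested with exact integers."""
    found = []
    for n in range(3, n_max + 1):
        for g in range(2, n):
            for h in range(1, g):
                # g > sqrt(n (g - h)) is equivalent to g^2 > n (g - h) for positive values.
                if g * g > n * (g - h) and comb(n, g) > n ** h:
                    found.append((n, g, h))
    return found

print(theorem3_solutions(30))  # an empty list means no solution in this range
```

Covering the full range $n < 15664$ this way would require pruning the loops (for instance using $h < 88$ and the bound on $\beta - \alpha$), which is omitted here.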



Similarly we can show that for any constant $c$, the inequalities

$$c\binom{n}{g} > n^h, \qquad (3)$$

$$g > \sqrt{n(g-h)} \qquad (4)$$

have only finitely many positive integral solutions.

Denote by 7 and by 5. To prove the second part of the lemma, it suffices to see that 

(g) = (%-fc)) — c 2, k f° r some constant C2 only depending on a and (3. 

3 The Decoding Problem of Reed-Solomon Codes and the Discrete Logarithm over Finite Fields

Let $q$ be a prime power and let $F_q$ be the finite field with $q$ elements. Let $S$ be a subset
of $F_q$ of $n$ elements. For a positive integer $g < n$, consider

$$S_g = \{A \mid A \subseteq S, |A| = g\}.$$

For any $A \in S_g$, denote $\prod_{a \in A}(x - a)$ by $P_A(x)$. Let $h(x)$ be an irreducible
monic polynomial over $F_q$ of degree $h < g$. Define a map $\psi : S_g \to F_q[x]/(h(x))$ by

$$\psi(A) = P_A(x) \pmod{h(x)}.$$

For any $f(x)$ in $F_q[x]/(h(x))$, if $\psi^{-1}(f(x))$ is not empty, then there exists at least
one polynomial $t(x)$ and one $A \in S_g$ such that $f(x) + t(x)h(x) = P_A(x)$. For any
$a \in A$, $P_A(a) = 0$, so $t(a) = -f(a)/h(a)$. Hence there are at least $g$ elements in $S$
which are roots of $f(x) + t(x)h(x) = 0$, and the curve $y = t(x)$ passes through at least $g$
points in the following set of $n$ points:

$$\{(a, -f(a)/h(a)) \mid a \in S\}.$$

According to the pigeonhole principle, there must exist a polynomial $f(x)$ such that
$|\psi^{-1}(f(x))| \ge |S_g| / |F_q[x]/(h(x))| = \binom{n}{g}/q^h$. Note that $t(x)$ has degree
$g - h$ and leading coefficient 1. For any polynomial $f \in F_q[x]$ of degree at most $h - 1$,
let $T_{f(x)}$ be the set of polynomials $t(x)$ of degree $g - h$ such that
$f(x) + t(x)h(x) = P_A(x)$ for some $A \in S_g$, and let $C_{f(x)}$ be the set of codewords
within distance $n - g$ of the received word $(-f(a)/h(a) - a^{g-h})_{a \in S}$ in the
Reed-Solomon code $[n, g - h]_q$. There is a one-to-one correspondence between $T_{f(x)}$ and
$C_{f(x)}$, given by sending any $t(x) \in T_{f(x)}$ to $(t(a) - a^{g-h})_{a \in S}$.

Suppose that we know $f(x)$ and $h(x)$, but not $A$. Are we still able to find $t(x)$? This is
just a list decoding problem for the Reed-Solomon code $[n, g - h]_q$. Once we have a list of
candidates $t(x)$, we can find $A$ by factoring $f(x) + t(x)h(x)$. This provides a general
framework for the following proofs.
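
The correspondence can be traced on a toy instance. The sketch below assumes the prime field $F_7$, the evaluation set $S = \{0, \ldots, 5\}$, the hidden subset $A = \{1, 2, 4\}$ and the irreducible polynomial $h(x) = x^2 + 1$; all of these, along with the helper names, are choices made for illustration. It computes $f = \psi(A)$ and $t$ with $f + th = P_A$, then checks that $t$ agrees with the points $(a, -f(a)/h(a))$ exactly on $A$.

```python
def poly_mul(a, b, p):
    """Multiply polynomials over F_p (coefficient lists, lowest degree first)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] = (out[i + j] + x * y) % p
    return out

def poly_divmod(a, b, p):
    """Return (quotient, remainder) of a divided by b over F_p; assumes deg a >= deg b."""
    a = a[:]
    inv_lead = pow(b[-1], -1, p)   # modular inverse, Python 3.8+
    quotient = [0] * (len(a) - len(b) + 1)
    for shift in range(len(a) - len(b), -1, -1):
        factor = (a[shift + len(b) - 1] * inv_lead) % p
        quotient[shift] = factor
        for j, y in enumerate(b):
            a[shift + j] = (a[shift + j] - factor * y) % p
    return quotient, a[:len(b) - 1]

def poly_eval(a, x, p):
    value = 0
    for c in reversed(a):
        value = (value * x + c) % p
    return value

p = 7
S = [0, 1, 2, 3, 4, 5]     # evaluation set, n = 6
A = [1, 2, 4]              # the hidden subset, g = 3
h_poly = [1, 0, 1]         # h(x) = x^2 + 1, irreducible over F_7 (-1 is not a square mod 7)

P_A = [1]
for a in A:
    P_A = poly_mul(P_A, [(-a) % p, 1], p)      # multiply by (x - a)

t_poly, f_poly = poly_divmod(P_A, h_poly, p)   # P_A = t*h + f, so psi(A) = f

# Point set {(a, -f(a)/h(a)) : a in S} and the positions where y = t(x) passes through it.
received = {a: (-poly_eval(f_poly, a, p) * pow(poly_eval(h_poly, a, p), -1, p)) % p for a in S}
agreements = [a for a in S if poly_eval(t_poly, a, p) == received[a]]
print(t_poly, f_poly, agreements)
```

A decoder that is handed only `received` and recovers `t_poly` therefore reveals $A$ as the root set of $f + th$, which is the step the proofs below rely on.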

3.1 The proof of Theorem 1

Given a Reed-Solomon code $[n, k]_q$, let $h = g(n,k,q) - k$. Recall that $g(n,k,q)$ is the
smallest integer $g$ such that $\binom{n}{g}/q^{g-k}$ is less than 1, and $h$ is the degree of an
irreducible polynomial $h(x)$. We show that there is an efficient algorithm to solve the
discrete logarithm over $F_{q^{g(n,k,q)-k}} = F_q[x]/(h(x))$ if there is an efficient list
decoding algorithm for the Reed-Solomon code $[n, k]_q$ with radius $n - g(n,k,q)$. Let
$\alpha = x \pmod{h(x)}$. Suppose that we are given the base $b(\alpha)$ and we need to find the
discrete logarithm of $t(\alpha)$ with respect to the base, where $b$ and $t$ are polynomials
over $F_q$ of degree at most $h - 1$. The existence of an efficient list decoding algorithm
implies:

1. There are only polynomially many codewords in any Hamming ball of radius $n - g(n,k,q)$,
which in turn implies that $|\psi^{-1}(f)| < q^c$ for any $f \in F_{q^h}$ and a constant $c$.
Hence

$$|\psi(S_{g(n,k,q)})| \ge \binom{n}{g(n,k,q)}/q^c = \Theta(q^{g(n,k,q)-k}/q^c) = \Theta(q^h/q^c).$$

2. And they can be found in polynomial time.

We use the index calculus algorithm with factor base $(\alpha + a)_{a \in S}$. If we randomly
select an integer $i$ between 0 and $q^{g(n,k,q)-k} - 1$, then with probability bigger than
$1/q^c$, $\psi^{-1}(b(\alpha)^i)$ is not empty. Applying the list decoding algorithm, we get
relations

$$b(\alpha)^i = f(\alpha) = \prod_{a \in A_1} (\alpha + a) = \cdots = \prod_{a \in A_l} (\alpha + a)$$

for some $A_1, A_2, \ldots, A_l \in S_{g(n,k,q)}$, where $l$ is the list size. From the
relations, we get linear equations

$$i = \sum_{a \in A_1} \log_b(\alpha + a) = \cdots = \sum_{a \in A_l} \log_b(\alpha + a) \pmod{q^{g(n,k,q)-k} - 1}.$$

We repeat the above procedure. Since $i$ is picked randomly, and $S_g$ is the sample space, the
probability that a new equation is linearly independent of the previous ones is very high at the
beginning of the algorithm. It will not take long before we get $n$ independent equations.
Solving the system of equations gives us $\log_b(\alpha + a)$ for all $a \in F_q$.

In the last step, for a random $i$, we compute $b(\alpha)^i t(\alpha)$. If
$\psi^{-1}(b(\alpha)^i t(\alpha))$ is not empty, we can solve for $\log_b t$ immediately. This
proves the main theorem.
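
The last step can be spelled out as a short derivation. It is a sketch under the assumption that the logarithms $\log_b(\alpha + a)$ of the factor base are already known and that the decoder returns a set $A$ for the element $b(\alpha)^i t(\alpha)$; here $h = g(n,k,q) - k$.

```latex
\begin{align*}
b(\alpha)^{i}\, t(\alpha) &= \prod_{a \in A} (\alpha + a), \\
i + \log_b t(\alpha) &\equiv \sum_{a \in A} \log_b(\alpha + a) \pmod{q^{h} - 1}, \\
\log_b t(\alpha) &\equiv \sum_{a \in A} \log_b(\alpha + a) - i \pmod{q^{h} - 1}.
\end{align*}
```

Every term on the right-hand side of the final congruence is known at this point, so a single successful decoding suffices.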

3.2 The proof of Theorem 2

Theorem 4 Let $q$ be a prime power and let $h$ be a positive integer. If $q > (h + 2)^4$, then
every element in $F_{q^h}^*$ can be written as a product of exactly $4h + 4$ distinct factors
from $\{\alpha + a \mid a \in F_q\}$, for any $\alpha$ such that $F_q(\alpha) = F_{q^h}$.

Proof: We thank Chaohua Jia for helpful discussion on the proof of this theorem. Fix an $\alpha$
such that $F_q(\alpha) = F_{q^h}$. For $\beta \in F_{q^h}^*$, let $N_k(\beta)$ denote the number
of solutions of the equation

$$\beta = \prod_{i=1}^{k} (\alpha + a_i), \quad a_i \in F_q,$$

where the $a_i$'s are distinct. We need to show that for $k = 4h + 4$, the number $N_k(\beta)$ is
always positive if $q > (h + 2)^4$.

Let $G$ be the character group of the multiplicative group $F_{q^h}^*$, which is a cyclic group
of order $q^h - 1$. A simple inclusion-exclusion argument shows that

$$N_k(\beta) \ge \frac{1}{q^h - 1} \left( \sum_{a_i \in F_q,\ 1 \le i \le k} - \sum_{1 \le i_1 < i_2 \le k}\ \sum_{a_i \in F_q,\ a_{i_1} = a_{i_2}} \right) \sum_{\chi \in G} \chi^{-1}(\beta)\, \chi\!\left(\prod_{i=1}^{k} (\alpha + a_i)\right).$$

For non-trivial $\chi$, one has the well-known Weil estimate

$$\left| \sum_{a \in F_q} \chi(\alpha + a) \right| \le (h - 1)\sqrt{q}.$$

We deduce that

$$N_k(\beta) \ge \frac{q^k - \binom{k}{2} q^{k-1}}{q^h - 1} - \left(1 + \binom{k}{2}\right)(h - 1)^k q^{k/2}.$$

In order for $N_k(\beta) > 0$, it suffices to have the inequality

$$\frac{q^k - \binom{k}{2} q^{k-1}}{q^h - 1} > \left(1 + \binom{k}{2}\right)(h - 1)^k q^{k/2}.$$

This inequality is clearly satisfied if both $q > 2\binom{k}{2} + 1 = k(k - 1) + 1$ and
$q^{k/2 - 1 - h} > (h - 1)^k$. These two inequalities are satisfied if we take $k = 4h + 4$ and
$q > (h + 2)^4$. The theorem is proved.

□
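
The closing step of the proof can be sanity-checked numerically. The sketch below assumes the two sufficient conditions read $q > k(k-1) + 1$ and $q^{k/2 - 1 - h} > (h-1)^k$ with $k = 4h + 4$ (so that $k/2 - 1 - h = h + 1$), and tests them at the smallest integer exceeding $(h+2)^4$, which need not be a prime power; larger $q$ only makes both conditions easier to satisfy.

```python
from math import comb

def theorem4_conditions(h, q):
    """Evaluate the two assumed sufficient conditions for k = 4h + 4."""
    k = 4 * h + 4
    first = q > 2 * comb(k, 2) + 1         # q > k(k-1) + 1
    second = q ** (h + 1) > (h - 1) ** k   # q^(k/2 - 1 - h) > (h-1)^k
    return first, second

for h in range(1, 6):
    q_min = (h + 2) ** 4 + 1   # smallest integer with q > (h+2)^4
    print(h, q_min, theorem4_conditions(h, q_min))
```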

Now we are ready to prove Theorem 2.

Proof: Let $h(x)$ be an irreducible polynomial over $F_q$ of degree $h$. Then
$F_{q^h} = F_q[x]/(h(x))$. Denote $x \pmod{h(x)}$ by $\alpha$. Suppose we need to solve the
discrete logarithm of $t(\alpha)$ base $b(\alpha)$ in $F_{q^h}$, where $b$ and $t$ are
polynomials of degree at most $h - 1$. We let $S = F_q$ and

$$(F_q)_{4h+4} = \{A \mid A \subseteq F_q, |A| = 4h + 4\}.$$

First we randomly select an integer $i$ between 0 and $q^h - 1$. Compute $b(\alpha)^i$, and let
$f(\alpha)$ be the result, where $f(x)$ is a polynomial of degree at most $h - 1$. Now run the
bounded distance decoding algorithm on the Reed-Solomon code $[q, 3h + 4]_q$ with the point set
$\{(a, -f(a)/h(a) - a^{3h+4}) \mid a \in F_q\}$ and the distance bound $q - 4h - 4$. Then
according to Theorem 4 the answer is not the empty set. Let the answer be $t(x) - x^{3h+4}$. The
polynomial $t(x)$ has degree $3h + 4$, and agrees with $\{(x, -f(x)/h(x)) \mid x \in F_q\}$ at
$4h + 4$ or more points. The polynomial $f(x) + t(x)h(x)$ has degree at most $4h + 4$, but has at
least $4h + 4$ distinct zeros, thus it splits completely as a product of linear factors. Let
$f(x) + t(x)h(x) = \prod_{a \in A}(x - a)$ for some $A \in (F_q)_{4h+4}$. Written another way,

$$b(\alpha)^i = \prod_{a \in A} (\alpha + a).$$

We get

$$i = \sum_{a \in A} \log_b(\alpha + a) \pmod{q^h - 1}.$$

However, we may not be able to solve for $\log_b(\alpha + a)$ for all $a \in F_q$, since the
later relations may be linearly dependent on the earlier relations. This is the case, for
instance, when all the $A_i$'s come from a subset of $F_q$. After we detect that, we start to
compute $t(\alpha)b(\alpha)^i$, and find its representation as a product of linear factors. Any
linear dependence will give us the discrete logarithm of $t(\alpha)$ base $b(\alpha)$. □



4 Group Size and List Size 

Let $q$ be a prime power, and $S$ be a subset of $F_q$ of $n$ elements, where $n$ is very small
compared to $q$. Let $\alpha$ be an element in $F_{q^h}$ such that $F_q[\alpha] = F_{q^h}$. What
is the order of the subgroup generated by $\alpha + S$ for some $S \subseteq F_q$? This question
has an important application in analyzing the performance of the AKS primality testing
algorithm. Experimental data suggests that the order is greater than $q^{h/c}$ for some absolute
constant $c$ for $|S| > h \log q$. If we can prove it, the space complexity of the AKS algorithm
can be cut by a factor of $\log p$ ($p$ is the input prime whose primality certificate is
sought), which will make (the random variants of) the algorithm comparable to the primality
proving algorithms used in practice. However, the best known lower bound is $(c|S|/h)^h$ for
some absolute constant $c$ [15]. We discover an interesting duality between the group size and
the list size in Hamming balls of a certain radius.

Theorem 5 Let $k, n$ be positive integers and $q$ be a prime power. One of the following
statements must be true.

1. For any constant $c_1$, there exists a Reed-Solomon code $[n, k]_q$ ($n/3 < k < n/2$), and a
Hamming ball of radius $n - g(n,k,q)$ containing more than $c_1 1.9^n$ codewords.

2. Let $s = \log q$. The group generated by $\alpha + S$ has cardinality at least $q^{h/c_2}$ for
some absolute constant $c_2$, where $S \subseteq F_q$ and $|S| = h \log q$.

Proving the first statement would solve an important open problem about Reed-Solomon codes.
Proving the second statement would give us a primality proving algorithm much more efficient in
terms of space complexity than the original AKS algorithm and its random variants, and hence
make the AKS algorithm not only theoretically interesting but also practically important.
At this stage we cannot determine which one is true. What we can prove is that one of them must
be true. Note that it is also possible that both statements are true.

Proof: Let $s = \log q$, $k = sh/2 - h$ and $n = sh$. So the rate $k/n$ is very close to $1/2$ as
$s$ gets large, and $g(n,k,q) = sh/2$. Assume the first statement is wrong. This means that
there exists a constant $c_3$ such that for any Reed-Solomon code $[n, k]_q$ with
$n/3 < k < n/2$, the number of codewords in any Hamming ball of radius $n - g(n,k,q)$ is less
than $c_3 1.9^n$. The number of balls of that radius which contain at least one codeword and
have center point $(-f(a)/h(a) - a^k)_{a \in S \subseteq F_q}$, where $f \in F_q[x]$ has degree
less than $h$, is greater than

$$q^h/(c_3 1.9^n) = q^{h - n \log 1.9/\log q}/c_3 > q^{h/c},$$

which is a lower bound on the size of the group generated by $\alpha + S$. □



5 Concluding Remarks 

Interesting open questions include whether the decoding problem of Reed-Solomon codes is
equivalent to or harder than the discrete logarithm over finite fields, and whether there exists
a polynomial time quantum algorithm to solve the decoding problem of Reed-Solomon codes.